We claim:

1. A programmable processor comprising:
an instruction path; 
a data path; 

an external interface operable to receive data from an external source and communicate

the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data 

path; 

a register file operable to receive and store data from the data path and communicate the
stored data to the data path; and

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a first and a second register each 
having a register width, the first and second registers providing a plurality of data elements each 

having an elemental width smaller than the register width of the first and second registers, the
data selection operand comprising a plurality of fields each selecting one of the plurality of data 
elements, the execution unit is operable to provide the data element selected by each field of the 
data selection operand to a predetermined position in a catenated result. 

2. The processor of claim 1 wherein each field of the data selection operand provides a
sufficient number of bits to specify any one of the plurality of data elements. 

3. The processor of claim 2 wherein each field of the data selection operand has a width of n 
bits wherein the plurality of data elements comprises 2^n data elements.

4. The processor of claim 1 wherein the data selection operand is provided by a register
specified by the single instruction.

5. The processor of claim 4 wherein the data selection operand has a width equal to the 
specified register width. 

6. The processor of claim 1 wherein the catenated result is provided to a register. 

7. The processor of claim 1 wherein the plurality of data elements has a combined width 
equal to the width of the first register plus the width of the second register. 

8. The processor of claim 1 wherein the instruction further specifies a data element width of 
the plurality of data elements. 

9. The processor of claim 1 wherein each data element has a width of 8 bits. 

10. The processor of claim 1 wherein the catenated result has a width of 128 bits. 

11. The processor of claim 1 wherein for each field of the data selection operand, a relative
location of the field within the data selection operand corresponds to a relative location of the 
predetermined position within the catenated result. 

12. The processor of claim 1 wherein the execution unit is further operable to, in response to
decoding a second single instruction specifying a third and a fourth register each containing a
plurality of operands, multiply the plurality of floating point operands in the third register by the
plurality of operands in the fourth register to produce a plurality of products and provide the
plurality of products to partitioned fields of a result register as a second catenated result.

13. A programmable processor comprising: 
an instruction path; 

a data path; 

an external interface operable to receive data from an external source and communicate

the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data

path; 

a register file operable to receive and store data from the data path and communicate the 

stored data to the data path; and

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a register having a register width, the 
register providing a plurality of data elements each having an elemental width smaller than the
register width of the register, the data selection operand comprising a plurality of fields each
selecting one of the plurality of data elements, the execution unit is operable to provide the data 
element selected by each field of the data selection operand to a predetermined position in a 
catenated result. 

14. A data processing system comprising:

(a) a bus coupling components in the data processing system; 

(b) an external memory coupled to the bus; 

(c) a programmable microprocessor coupled to the bus and capable of operation 
independent of another host processor, the microprocessor comprising: 

an instruction path; 
a data path; 

an external interface operable to receive data from an external source and communicate 
the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data

path; 

a register file operable to receive and store data from the data path and communicate the 
stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a first and a second register each 
having a register width, the first and second registers providing a plurality of data elements each 
having an elemental width smaller than the register width of the first and second registers, the 
data selection operand comprising a plurality of fields each selecting one of the plurality of data 
elements, the execution unit is operable to provide the data element selected by each field of the 
data selection operand to a predetermined position in a catenated result. 

15. The system of claim 14 wherein each field of the data selection operand provides a
sufficient number of bits to specify any one of the plurality of data elements. 

16. The system of claim 15 wherein each field of the data selection operand has a width of n 
bits wherein the plurality of data elements comprises 2^n data elements.

17. The system of claim 14 wherein the data selection operand is provided by a register 
specified by the single instruction.

18. The system of claim 17 wherein the data selection operand has a width equal to the
specified register width. 

19. The system of claim 14 wherein the catenated result is provided to a register.

20. The system of claim 14 wherein the plurality of data elements has a combined width
equal to the width of the first register plus the width of the second register. 

21. The system of claim 14 wherein the instruction further specifies a data element width of
the plurality of data elements. 

22. The system of claim 14 wherein each data element has a width of 8 bits. 

23. The system of claim 14 wherein the catenated result has a width of 128 bits. 

24. The system of claim 14 wherein for each field of the data selection operand, a relative
location of the field within the data selection operand corresponds to a relative location of the 
predetermined position within the catenated result. 

25. The system of claim 14 wherein the execution unit is further operable to, in response to 
decoding a second single instruction specifying a third and a fourth register each containing a 
plurality of operands, multiply the plurality of floating point operands in the third register by the 
plurality of operands in the fourth register to produce a plurality of products and provide the 
plurality of products to partitioned fields of a result register as a second catenated result. 

26. A data processing system comprising: 

(a) a bus coupling components in the data processing system; 

(b) an external memory coupled to the bus; 

(c) a programmable microprocessor coupled to the bus and capable of operation

independent of another host processor, the microprocessor comprising: 
an instruction path; 
a data path; 

an external interface operable to receive data from an external source and communicate
the received data over the data path;

a cache operable to retain data communicated between the external interface and the data

path; 

a register file operable to receive and store data from the data path and communicate the 
stored data to the data path; and

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a register having a register width, the 
register providing a plurality of data elements each having an elemental width smaller than the 
register width of the register, the data selection operand comprising a plurality of fields each 
selecting one of the plurality of data elements, the execution unit is operable to provide the data 
element selected by each field of the data selection operand to a predetermined position in a 
catenated result. 
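
For illustration only, and forming no part of the claims, the selection operation recited in claims 1 through 11 may be modeled in software. The Python sketch below assumes 8-bit data elements and 128-bit registers, as in claims 9 and 10. The function name `select_elements`, the placement of each field in the low-order bits of its element position, and the numbering of data elements with the first register lowest are hypothetical choices made for the example; the claims do not fix them.

```python
def select_elements(selector, first, second, elem_bits=8, reg_bits=128):
    """Model the claimed selection: each selector field picks one source element.

    Registers are modeled as Python integers, with element 0 in the low bits.
    """
    per_reg = reg_bits // elem_bits          # elements held by one register
    total = 2 * per_reg                      # elements across both registers
    field_bits = (total - 1).bit_length()    # n bits address 2^n elements
    # Assumption: each field fits inside one element-sized slot of the selector.
    assert field_bits <= elem_bits

    # Assumption: the first register supplies the low-numbered elements.
    source = (second << reg_bits) | first
    elem_mask = (1 << elem_bits) - 1
    field_mask = (1 << field_bits) - 1

    result = 0
    for pos in range(per_reg):
        # Field `pos` of the selector controls position `pos` of the result.
        index = (selector >> (pos * elem_bits)) & field_mask
        element = (source >> (index * elem_bits)) & elem_mask
        result |= element << (pos * elem_bits)
    return result


# Usage: select the sixteen bytes of the second register in reverse order.
first = int.from_bytes(bytes(range(16)), "little")
second = int.from_bytes(bytes(range(16, 32)), "little")
selector = int.from_bytes(bytes(range(31, 15, -1)), "little")
reversed_second = select_elements(selector, first, second)
```

With these assumptions each field is 5 bits wide and can name any of the 32 source bytes, consistent with the relationship between field width and element count in claim 3, and the position of a field in the selector determines the position of the selected element in the result, as in claim 11.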